Debug about incremental indexing and also MCP #26
Pull request overview
This PR tightens incremental indexing and MCP server behaviors to avoid mixed in-memory artifacts, improve repo naming stability for local paths, and eliminate stdout print() noise (important for MCP stdio).
Changes:
- Add stable local-path repo naming (get_repo_name_from_path()) with legacy index reuse and collision detection in MCP flows.
- Harden indexing/caching flows: rebuild vector/graph state cleanly per repo, improve multi-repo cache loading correctness, and add incremental reindex safeguards (embedding dimension checks, effective repo_root, overview refresh, manifest persistence).
- Add a comprehensive regression test suite covering incremental reindexing + MCP behaviors and verifying no print() calls in critical paths.
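The suffix format parsed by strip_repo_hash_suffix() later in this review suggests the disambiguation scheme is "basename plus an 8-character hex hash". The actual get_repo_name_from_path() in fastcode/utils.py is not shown here, but a minimal sketch of that kind of stable, collision-resistant naming could look like this (function body and hash choice are assumptions, not the PR's implementation):

```python
import hashlib
import os


def repo_name_sketch(path: str) -> str:
    """Illustrative only: derive a stable repo identifier from a local
    path by combining the directory basename with a short hash of the
    absolute path, so same-named directories in different locations
    map to distinct names."""
    abs_path = os.path.abspath(path)
    base = os.path.basename(abs_path) or "repo"
    digest = hashlib.sha256(abs_path.encode("utf-8")).hexdigest()[:8]
    return f"{base}-{digest}"
```

Because the hash is derived from the absolute path, re-indexing the same directory always reuses the same name, while two checkouts of "myrepo" in different parents get distinct identifiers.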
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/test_incremental_indexing_regressions.py | New regression tests for incremental reindexing, MCP fallback logic, repo naming stability, and stdout safety. |
| mcp_server.py | Canonical repo naming for local paths, clean full-reindex helper, incremental fallback logic, stricter in-memory load validation, and enable incremental refresh for lookup tools. |
| fastcode/utils.py | Introduces get_repo_name_from_path() for stable, disambiguated local repo identifiers. |
| fastcode/loader.py | Uses get_repo_name_from_path() to name workspace copies consistently and support reuse of workspace copies. |
| fastcode/main.py | Ensures clean per-repo in-memory indexing, fresh graph builder usage, safer cache-load rebuilds, multi-repo cache hardening, and more robust incremental reindexing. |
| fastcode/vector_store.py | Replaces stdout print() with logger debug output in repo overview search. |
| fastcode/retriever.py | Replaces stdout print() with structured logging during repo selection. |
| fastcode/query_processor.py | Removes stdout print() of LLM response in favor of debug logging. |
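The stdout-safety checks in the new test suite are not shown here; one way such a test can verify "no print() calls in critical paths" is by walking each module's AST rather than capturing runtime output. A hedged sketch (the real tests in tests/test_incremental_indexing_regressions.py may use a different mechanism):

```python
import ast


def count_print_calls(source: str) -> int:
    """Count bare print(...) calls in a Python source string.
    Method calls like logger.debug(...) are Attribute nodes,
    not Name nodes, so they are not counted."""
    tree = ast.parse(source)
    return sum(
        1
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "print"
    )
```

A test would then assert count_print_calls(path.read_text()) == 0 for each critical module, which matters for MCP stdio because any stray stdout write corrupts the protocol stream.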
```python
# failures cannot leave stale repos marked as loaded.
self.loaded_repositories = {}
self.repo_indexed = False
self.repo_loaded = False
self.multi_repo_mode = False
if hasattr(self.retriever, "current_loaded_repos"):
    self.retriever.current_loaded_repos = None
```
In _load_multi_repo_cache, the in-memory merge reinitializes self.vector_store and resets self.retriever.current_loaded_repos, but it does not clear HybridRetriever's filtered caches (filtered_vector_store / filtered_bm25*). If a prior request loaded a filtered subset, subsequent queries without an explicit repo_filter can still use the stale filtered_vector_store (because _semantic_search prefers it when non-empty), returning incomplete results after a reload. Clear these filtered caches whenever rebuilding the merged in-memory index (either by calling invalidate_in_memory_state() here or explicitly resetting the filtered fields).
Suggested change:

```python
# failures cannot leave stale repos marked as loaded or stale
# filtered caches on the retriever.
self.loaded_repositories = {}
self.repo_indexed = False
self.repo_loaded = False
self.multi_repo_mode = False
# Reset retriever state that depends on the in-memory index
if hasattr(self.retriever, "current_loaded_repos"):
    self.retriever.current_loaded_repos = None
# Clear any filtered vector store used for subset queries
if hasattr(self.retriever, "filtered_vector_store"):
    self.retriever.filtered_vector_store = None
# Clear all BM25-based filtered caches (e.g., filtered_bm25_corpus,
# filtered_bm25_index, etc.) so subsequent queries use the fresh index.
for attr_name in dir(self.retriever):
    if attr_name.startswith("filtered_bm25"):
        setattr(self.retriever, attr_name, None)
```
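The invalidate_in_memory_state() alternative mentioned in the comment would move this reset into the retriever itself, so every rebuild path clears the same set of caches. A minimal sketch (the method name comes from the comment; the class body and field names here are assumptions inferred from the diff, not HybridRetriever's real definition):

```python
class HybridRetriever:
    # Only the cache-reset surface is sketched; the real class
    # carries much more state (embeddings, BM25 index, etc.).
    def __init__(self) -> None:
        self.current_loaded_repos = None
        self.filtered_vector_store = None
        self.filtered_bm25_corpus = None
        self.filtered_bm25_index = None

    def invalidate_in_memory_state(self) -> None:
        """Drop every cache derived from the merged in-memory index
        so the next query rebuilds against fresh data instead of a
        stale filtered subset."""
        self.current_loaded_repos = None
        self.filtered_vector_store = None
        for attr_name in dir(self):
            if attr_name.startswith("filtered_bm25"):
                setattr(self, attr_name, None)
```

Centralizing the reset means a future filtered_bm25_* field added to the retriever is cleared automatically, rather than relying on each caller to remember the full list.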
```python
def strip_repo_hash_suffix(repo_name: str) -> Optional[str]:
    base_name, sep, suffix = repo_name.rpartition("-")
    if not sep or len(suffix) != 8:
        return None
    if not all(ch in "0123456789abcdef" for ch in suffix.lower()):
        return None
    return base_name or None
```
strip_repo_hash_suffix() is implemented twice in this file (once inside _repo_name_from_source and again inside _ensure_repos_ready). Duplicating the parsing rules increases the chance these helpers drift over time (e.g., suffix length/format changes) and creates subtle behavior mismatches. Consider extracting a single module-level helper and reusing it from both call sites.
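Extracted once at module level (placing it in fastcode/utils.py next to get_repo_name_from_path() is an assumption about where it would best live), both call sites share a single definition of the suffix format:

```python
from typing import Optional


def strip_repo_hash_suffix(repo_name: str) -> Optional[str]:
    """Return the base name if repo_name ends in "-<8 hex chars>",
    else None. Single source of truth for the hash-suffix format,
    reused by _repo_name_from_source and _ensure_repos_ready."""
    base_name, sep, suffix = repo_name.rpartition("-")
    if not sep or len(suffix) != 8:
        return None
    if not all(ch in "0123456789abcdef" for ch in suffix.lower()):
        return None
    return base_name or None
```

With one definition, a future change to the suffix length or alphabet is made in exactly one place, eliminating the drift risk the comment describes.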